init commit optimistic relay #1

Open · wants to merge 22 commits into main

Conversation

@michaelneuder (Owner) commented Dec 29, 2022

there are a few potential paths we could take to set up an optimistic relay. below are two that i considered:

Option 1. Leave the builder status as is (HighPrio, LowPrio, Blacklisted). Modify the relay logic to only run block simulation for LowPrio builders, thus redefining HighPrio as optimistic. When we call the prio-load-balancer (https://github.com/flashbots/prio-load-balancer), we then only ever use the LowPrio queue.
Pros

  • No modification of the redis KV store or postgres db schema.
  • Smallest amount of code change.

Cons

  • Reduces our flexibility. We can no longer differentiate between LowPrio and HighPrio builders who are in the pessimistic mode.

Option 2. Add a new builder status called Optimistic. Thus we effectively have 4 builder statuses: Optimistic > HighPrio > LowPrio > Blacklisted.
Pros

  • More flexibility, as it allows us to maintain the distinction between different priorities for the pessimistic builders.
  • Doesn't conflate HighPrio, which has a different semantic meaning in https://github.com/flashbots/prio-load-balancer, with Optimistic, which is a new concept we introduce.

Cons

  • More code.
  • Requires schema and redis updates.
  • Cognitive load to explain the new hierarchy of builder statuses.

Based on the above, I propose Option 2 for its flexibility.

initial plan for changes required to implement Option 2:

  1. add a new builder state called Optimistic (a minimal sketch of the new status hierarchy follows this list).
  2. modify the handleSubmitNewBlock Builder API handler to skip block simulation for blocks from builders in the Optimistic state. these blocks will be pushed into an OptimisticBlock buffered channel where their validity will be checked asynchronously. if any block turns out to be invalid, we demote its builder to LowPrio status (or could even blacklist them if we want to be conservative).
  3. modify the handleGetHeader Proposer API handler to check the best-bid block (could do this in-band or through a similar buffered-channel mechanism as we do for all incoming blocks). if it turns out that the winning bid was an invalid block for one of our proposers (and they ended up publishing it), we need to reimburse them using the collateral posted by the builder of the invalid block (still need to flesh this out a bit).
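
To make point 1 concrete, here is a minimal sketch of what the extended status hierarchy could look like; the type and constant names are illustrative, not actual mev-boost-relay identifiers:

```go
// Illustrative only: the builder status ordering under Option 2,
// Optimistic > HighPrio > LowPrio > Blacklisted.
package relay

type BlockBuilderStatus string

const (
	StatusOptimistic  BlockBuilderStatus = "optimistic"
	StatusHighPrio    BlockBuilderStatus = "high-prio"
	StatusLowPrio     BlockBuilderStatus = "low-prio"
	StatusBlacklisted BlockBuilderStatus = "blacklisted"
)
```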

open questions:

  1. how do we ensure that we only accept optimistic blocks if the previous slot block has been validated?

@JustinDrake

I propose Option 2 for its flexibility

Agreed Option 2 is best :) A nice-to-have goal is for the optimistic relay code to be upstreamed to https://github.com/flashbots/mev-boost-relay, and for that we should avoid redefining things like HighPrio.

@blombern

Great work! I also agree on option 2.

One question about point 2 of the implementation: I'm assuming the OptimisticBlock channel would still only talk to the prio-load-balancer (when enabled), and not to the block validation nodes directly?

I'm also wondering if there's a possible edge case where there are optimistic blocks buffered in memory when the service is shut down (can happen whenever), and we end up never validating them. The worst-case block validation time can reach up to 6s. 95th percentile is way lower, but the distribution has a very long right tail so it does happen. Just thinking out loud.

Excited to see this idea come to life :)

@michaelneuder (Owner, Author) commented Dec 31, 2022

Ok I added more to this implementation. I break down the changes into 3 main sequences:

a) Block Submission Flow
b) Optimistic Block Processing
c) Block Proposal Flow

I will describe each of these in more detail and the changes required for them.


[Screenshot: Block Submission Flow diagram]

  1. Block builders submit blocks to the Builder API endpoint of the relay.
  2. The block builder API goroutine fetches the status of the block builder from redis.
  3. Based on the status of the builder, 3 paths can be taken.
    a. if the builder is in optimistic mode, send the block to the Optimistic Block Channel.
    b. if the builder is in high prio mode, send the block to the prio-load-balancer.
    c. if the builder is in low prio mode, send the block to the prio-load-balancer.
  4. If the builder is not in optimistic mode, wait until the block has been successfully simulated on the validation nodes.
  5. Update the builder's current bid in redis.

Notice that for builders in optimistic mode, we update the bid after sending the block to the Optimistic Block Channel, without validating it. This is where the improved performance can be achieved.
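
A rough, self-contained sketch of the step-3 branch described above; every name here is a stand-in for illustration, not the relay's real handler, types, or helpers:

```go
// Rough sketch of the step-3 branch in the block submission flow.
// All names here are illustrative stand-ins, not mev-boost-relay's real API.
package relay

import "errors"

type SubmitBlockRequest struct {
	BuilderPubkey string
	Value         uint64
}

var (
	errBlacklisted = errors.New("builder is blacklisted")

	// optimisticCh stands in for the buffered Optimistic Block Channel.
	optimisticCh = make(chan *SubmitBlockRequest, 128)
)

// simulateBlock stands in for the synchronous prio-load-balancer round trip.
func simulateBlock(req *SubmitBlockRequest, highPrio bool) error { return nil }

// updateBestBid stands in for the redis bid update (step 5).
func updateBestBid(req *SubmitBlockRequest) error { return nil }

func handleSubmission(req *SubmitBlockRequest, status string) error {
	switch status {
	case "optimistic":
		// 3a: skip the synchronous simulation; validity is checked asynchronously.
		optimisticCh <- req
	case "high-prio", "low-prio":
		// 3b/3c and 4: block until the simulation result comes back.
		if err := simulateBlock(req, status == "high-prio"); err != nil {
			return err
		}
	default:
		return errBlacklisted
	}
	// 5: only now update the builder's current bid.
	return updateBestBid(req)
}
```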


[Screenshot: Optimistic Block Processing diagram]

  1. The OptimisticBlockProcessor receives a block from the Optimistic Block Channel (this happens in a different goroutine than the Builder API).
  2. The OptimisticBlockProcessor sends the block as high prio to the prio-load-balancer.
  3. The block is simulated on the validation nodes, and the status is returned to the OptimisticBlockProcessor.
  4. If the simulation fails, the status of the builder is updated so that they are no longer Optimistic (currently I am setting it to low prio).

This goroutine handles the simulation of all the blocks that we optimistically skipped in the Block Submission Flow. If we determine that a builder submitted an invalid block, we change their status and stop optimistically handling their blocks.
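
A minimal sketch of that processor loop, with placeholder names rather than the relay's actual types and functions:

```go
// Illustrative sketch of the Optimistic Block Processing goroutine.
package relay

import "log"

type OptimisticBlock struct {
	BuilderPubkey string
}

// simulateHighPrio stands in for the prio-load-balancer round trip (steps 2-3).
func simulateHighPrio(blk *OptimisticBlock) error { return nil }

// demoteBuilder stands in for the redis/db status update (step 4).
func demoteBuilder(pubkey string) {
	log.Printf("demoting builder %s to low-prio", pubkey)
}

// startOptimisticBlockProcessor drains the Optimistic Block Channel and
// demotes any builder whose block fails simulation.
func startOptimisticBlockProcessor(ch <-chan *OptimisticBlock) {
	for blk := range ch {
		if err := simulateHighPrio(blk); err != nil {
			demoteBuilder(blk.BuilderPubkey)
		}
	}
}
```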


[Screenshot: Block Proposal Flow diagram]

  1. mev-boost calls getHeader on the Proposer API of the relay. This is part of the MEV-Boost block proposal flow as documented in https://docs.flashbots.net/flashbots-mev-boost/architecture-overview/block-proposal.
  2. mev-boost calls getPayload on the Proposer API of the relay. This triggers the publication of a SignedBeaconBlock.
  3. In a separate goroutine, the Proposer API sends the block that was just proposed to the prio-load-balancer.
  4. The block is simulated using the validation nodes and the status is returned to the Proposer API.
  5. If the block simulation failed, that means we need to refund the proposer. We insert a new row into the ProposerRefundTable with the details of the bid and the SignedBlindedBlock, which contains enough information to confirm that the refund is owed.

This flow represents the process of checking whether our optimistic block processing ever results in a proposer publishing an invalid block. Since block builders post collateral, that collateral will be used to reimburse the proposer. Since refunds should be a relatively rare event, we plan on handling them manually when they occur.
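
A sketch of the asynchronous check in steps 3-5; all names here are placeholders, and the ProposerRefundTable write is shown only as a stub:

```go
// Illustrative sketch of the post-publication check in the Proposer API.
package relay

type ProposedBlock struct {
	Slot           uint64
	ProposerPubkey string
	BuilderPubkey  string
	ValueWei       string
}

// simulateProposedBlock stands in for sending the just-published block to the
// prio-load-balancer and waiting for the result (steps 3-4).
func simulateProposedBlock(blk *ProposedBlock) error { return nil }

// recordProposerRefund stands in for inserting a row into the
// ProposerRefundTable so the refund can be settled manually (step 5).
func recordProposerRefund(blk *ProposedBlock) error { return nil }

// checkProposedBlock runs in its own goroutine after handleGetPayload returns.
func checkProposedBlock(blk *ProposedBlock) {
	if err := simulateProposedBlock(blk); err != nil {
		// The proposer published an invalid block we served optimistically,
		// so a refund from the builder's collateral is owed.
		_ = recordProposerRefund(blk)
	}
}
```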

@michaelneuder (Owner, Author)

based on the advice of @JustinDrake, I added a check on the amount of collateral posted by the builder, which is set in the database and redis. If their collateral is less than the value of a block that they submit, then we fall back to the high-prio queue for that block. I chose not to make this state change permanent, because they might have just found one really profitable block, and it could turn out to be valid, so we shouldn't penalize them just because their collateral can't cover the block value.
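
A minimal sketch of that check, assuming placeholder names and a simple wei comparison:

```go
// Illustrative sketch of the per-block collateral check.
package relay

import "math/big"

// useOptimisticPath returns false when the builder's posted collateral cannot
// cover the block's value; in that case the block falls back to the
// pessimistic high-prio queue for this submission only.
func useOptimisticPath(status string, collateralWei, blockValueWei *big.Int) bool {
	if status != "optimistic" {
		return false
	}
	return collateralWei.Cmp(blockValueWei) >= 0
}
```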

I am going to create a state machine diagram outlining the status and collateral fields of the builder in the redis cache and the db.

One open question is how we get the redis cache updated with the collateral values we set in the DB. Still need to iron this out a bit.

@michaelneuder (Owner, Author)

also an additional note about the DB. It seems like the migration code is in place so that people who are already running the relay can pull in upstream changes while keeping their DB in place and only updating the fields. I modified the 001-initdb file directly, which would make these changes incompatible with relays that are already running. This might be something we want to refactor to make backwards compatible, but the migration logic would be non-trivial (e.g., we need to update the builder status based on the values of the existing rows rather than just dropping the column and adding a new one).

@@ -286,6 +298,9 @@ func (api *RelayAPI) StartServer() (err error) {
 	if api.opts.BlockBuilderAPI {
 		// Get current proposer duties blocking before starting, to have them ready
 		api.updateProposerDuties(bestSyncStatus.HeadSlot)
 
+		// TODO(mikeneuder): consider if we should use >1 optimisticBlockProcessors.
+		go api.startOptimisticBlockProcessor()

optimisticBlockProcessors are extremely lightweight, right? Why have more than 1?

michaelneuder (Owner, Author):

i guess it depends how many optimistic blocks we get. startOptimisticBlockProcessor waits for the response of the block simulation from the prio-load-balancer, so if we only have one goroutine running this we are processing all the optimistic blocks serially. i thought it might be desirable to have a few goroutines sending the simulation requests because we have multiple validation nodes running. thoughts?

it depends how many optimistic blocks we get

We should expect to get hammered with optimistic blocks, partly because all the builders will want to connect to the ultra sound relay, but also partly because we can relax QoS limits (like max 2 blocks/sec).

if we only have one goroutine running this we are processing all the optimistic blocks serially

My naive understanding was that startOptimisticBlockProcessor calls simulateBlock, which calls api.blockSimRateLimiter.send, which itself uses parallelism.

michaelneuder (Owner, Author):

that is the call path where we call api.BlockSimRateLimiter.send. that function can handle a bunch of concurrent blocks, but it still waits on the response from the simulation for the single block that we sent. so if we only have a single go startOptimisticBlockProcessor() I think we are only processing one block from that channel at a time.

e.g.,

  1. we receive a block from the channel
  2. we call api.BlockSimRateLimiter.send
  3. we wait until send returns to get the simulation error

whereas we could/should have multiple goroutines listening on the channel, and concurrent calls to api.BlockSimRateLimiter.send. maybe i am missing something though.
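
For example, a rough sketch of running several processors on the same channel (reusing the placeholder names from the Optimistic Block Processing sketch above, not the relay's real API):

```go
// Illustrative sketch: n workers draining the same channel, so up to n
// optimistic block simulations can be in flight at once.
package relay

func startOptimisticBlockProcessors(n int, ch <-chan *OptimisticBlock) {
	for i := 0; i < n; i++ {
		go func() {
			for blk := range ch {
				// each worker blocks on its own simulation round trip
				if err := simulateHighPrio(blk); err != nil {
					demoteBuilder(blk.BuilderPubkey)
				}
			}
		}()
	}
}
```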

concurrent calls to api.BlockSimRateLimiter.send

Ok, agreed we should have concurrent calls to api.BlockSimRateLimiter.send :) I don't have a particularly informed opinion as to the most natural way to do it here.

@michaelneuder (Owner, Author)

OK another rather large structural change in: 3a37075

  1. Add a field called collateral_id to the BlockBuilder table. We will use this to identify builders that use multiple builder pubkeys, but want to have the same collateral backstopping them.
  2. Add GetBlockBuildersFromCollateralID to the DB service which fetches all the builder pubkeys that have the same collateral id.
  3. Add demoteBuildersCollateralID to the API service which uses a single builder pubkey to fetch all pubkeys that share a collateral id and demotes them in both redis and the db (rough sketch below this list).
  4. Call demoteBuildersCollateralID from startOptimisticBlockProcessor and handleGetPayload if the simulation fails in either case.
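
A rough sketch of the demotion helper from point 3; the store interface, the GetCollateralID helper, and the method signatures are placeholders rather than the actual DB/redis services:

```go
// Illustrative sketch of demoting every builder pubkey backed by the same collateral.
package relay

import "fmt"

// builderStore is a placeholder for the slice of the DB/redis services used here.
type builderStore interface {
	GetCollateralID(builderPubkey string) (string, error)
	GetBlockBuildersFromCollateralID(collateralID string) ([]string, error)
	SetBuilderStatus(pubkey, status string) error // writes both redis and the db
}

// demoteBuildersCollateralID takes the offending builder's pubkey, finds every
// pubkey sharing its collateral_id, and demotes them all.
func demoteBuildersCollateralID(store builderStore, builderPubkey string) error {
	collateralID, err := store.GetCollateralID(builderPubkey)
	if err != nil {
		return fmt.Errorf("lookup collateral_id for %s: %w", builderPubkey, err)
	}
	pubkeys, err := store.GetBlockBuildersFromCollateralID(collateralID)
	if err != nil {
		return err
	}
	for _, pk := range pubkeys {
		if err := store.SetBuilderStatus(pk, "low-prio"); err != nil {
			return err
		}
	}
	return nil
}
```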
